Automatic generation of German pronunciation variants

Author

  • Maria-Barbara Wesenick
Abstract

The subject of this paper is a rule corpus of approx. 1500 phonetic rules that models segmental variation of pronunciation in German connected speech. The phonetic rules express, on a broad-phonetic level, phenomena of phonetic reduction in German that occur within words and across word boundaries. The rule corpus has been designed as a component of the Munich AUtomatic Segmentation System (MAUS), an HMM-based system that produces the transcription of a speech signal and the corresponding segment boundaries, given the orthographic representation of the utterance concerned (refer to Kipp et al. [2] for details). The fact that speech is highly variable has been taken into account by using the rules to complement the statistical modelling of German speech sounds and to constrain the Viterbi search. In this paper, a short introduction to the phenomenon of variability of speech and our approach to dealing with this problem in a technical application is presented first. This is followed by a formal description of the syntax of the rules and the inventory of symbols that is used. Finally, I give an outline of reduction phenomena in German and how they are represented in the phonetic rules.

1. THE REPRESENTATION OF SEGMENTAL VARIATION IN GERMAN IN PHONETIC RULES

A fundamental property of speech is that it is highly variable. No two utterances of the same word are ever produced in exactly the same way. Variability concerns the production of the same utterance by different speakers as well as the repeated production of an utterance by a single speaker (inter- vs. intra-speaker variability). Variability of speech depends, among other factors, on the immediate communicative situation, the speech rate, the speaking style and the complexity of the semantic content of utterances. Variability becomes apparent in the different realizations of a planned utterance. These can lie in a range from very clear, slow and precise to strongly reduced and fast.
This scale is known as the hyper-hypo continuum of speech (Lindblom [4]). Endeavouring to be intelligible and easy to understand for a listener, speakers attempt on the one hand to speak clearly and with precise articulation. This conflicts on the other hand with the general tendency to keep the articulatory effort as low as possible. In order to compromise, speakers permanently adjust their performance by taking into account the amount of information the listener can obtain from the communicative situation and the context of an utterance (system-oriented factors). Depending on the amount of information provided by the system-oriented factors, the information that is contained in the speech signal itself (output-oriented factors) needs to be more or less explicit. This explains the expected variation of utterances along the continuum of hyper- and hypospeech.

For many speech processing tasks a representation of the pronunciation of the language concerned is required, which is usually taken from common pronunciation dictionaries. The main problem with this is that dictionaries mostly give only one possible form of pronunciation, which is usually not the most common form. In speech technology, and especially in the field of speech recognition, the variability of utterances is difficult to deal with. One way to handle it is to capture it in statistical word models. But if it is necessary to refer to smaller units on a phonemic or broad-phonetic level, one has to take into account knowledge about the phonetic processes that lead to the variability, because free phoneme recognition has not been satisfactory yet. Concrete, complex and consistent information about possible variation in pronunciation is required. In segment-based speech recognition applications it is indispensable to process as much information about variation in pronunciation as possible, both for the analysis of the manifold input of human speech and for the development of reliable systems in the area of speech technology.
The rule system is an attempt to capture the different pronunciation forms of an utterance within the hyper-hypo continuum on a symbolic level, taking into account articulatory processes. The citation form, which reflects the phonemic structure of a carefully pronounced single word, serves as a reference form from which all other hypothesized pronunciation forms can be derived by symbolic rules. Thus the rules describe in an abstract way the segmental differences between the reference form and the pronunciation form that results from reduction.

4th International Conference on Spoken Language Processing (ICSLP 96), Philadelphia, PA, USA, October 3-6, 1996. ISCA Archive, http://www.isca-speech.org/archive
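The idea of deriving hypothesized pronunciation forms from a citation form by symbolic rules can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the `Rule` tuple format, the SAMPA-like symbols, the `#` word-boundary marker and the two example rules (schwa elision and nasal assimilation in "haben") are textbook-style stand-ins and are not taken from the paper's rule corpus or from the actual MAUS rule syntax.

```python
from typing import NamedTuple, Set, Tuple

Form = Tuple[str, ...]  # a pronunciation as a sequence of phone symbols


class Rule(NamedTuple):
    """Hypothetical context-sensitive rewrite rule: target -> replacement / left _ right."""
    left: Form
    target: Form
    replacement: Form
    right: Form


def apply_rule(form: Form, rule: Rule) -> Set[Form]:
    """Return every form obtained by applying the rule at exactly one position."""
    pattern = rule.left + rule.target + rule.right
    n = len(pattern)
    results = set()
    for i in range(len(form) - n + 1):
        if form[i:i + n] == pattern:
            start = i + len(rule.left)
            end = start + len(rule.target)
            results.add(form[:start] + rule.replacement + form[end:])
    return results


def generate_variants(citation: Form, rules) -> Set[Form]:
    """Collect the citation form and all forms reachable by repeated rule application.

    Terminates for the deletion/substitution rules used here; rules that
    lengthen a form without bound would need an explicit depth limit.
    """
    variants = {citation}
    frontier = [citation]
    while frontier:
        form = frontier.pop()
        for rule in rules:
            for new_form in apply_rule(form, rule):
                if new_form not in variants:
                    variants.add(new_form)
                    frontier.append(new_form)
    return variants


# Illustrative rules only (assumed, not from the paper):
RULES = [
    # schwa elision: @ -> (nothing) between b and word-final n
    Rule(left=("b",), target=("@",), replacement=(), right=("n", "#")),
    # progressive place assimilation: word-final n -> m after b
    Rule(left=("b",), target=("n",), replacement=("m",), right=("#",)),
]

CITATION = ("#", "h", "a:", "b", "@", "n", "#")  # "haben", citation form

if __name__ == "__main__":
    for variant in sorted(generate_variants(CITATION, RULES), key=len, reverse=True):
        print(" ".join(variant))
```

In this sketch the second rule only becomes applicable after the first has removed the schwa, which is why the generator keeps reapplying rules to newly derived forms rather than making a single pass over the citation form.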


Similar resources

Regional Pronunciation Variants for Automatic Segmentation

The goal of this paper is to create an extended rule corpus with approximately 2300 phonetic rules which model segmental variation of regional variants of German. The phonetic rules express at a broad-phonetic level phenomena of phonetic reduction in German that occur within words and across word boundaries. In order to get an improvement in automatic segmentation of regional speech variants, ...


Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training

This paper presents a mispronunciation detection system which uses automatic speech recognition to support computer-aided pronunciation training (CAPT). Our methodology extends a model pronunciation lexicon with possible phonetic mispronunciations that may appear in learners’ speech. Generation of these pronunciation variants was previously achieved by means of phone-to-phone mapping rules deriv...


Automatic rule-based generation of word pronunciation networks

In this paper a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants for each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. T...


German regional variants - a problem for automatic speech recognition?

A well-known problem in automatic speech recognition (ASR) is robustness against the variability of speech between speakers. There are several ways to normalise different speakers; one of them is to deal with the problem of regional variation. In this paper we discuss the problem of whether moderate regional variants of German influence the automatic speech recognition process and whether there ...


Regional Variants of German: Categories of Pronunciation Deviation from Standard German

This analysis describes categories of pronunciation variants we found in the transcription of monologues recorded for the RVG1 corpus (Regional Variants of German). Our results indicate that transcriptions on orthographic level provide useful information on regional variations of standard German. The pronunciation variants can be categorized into assimilation, enclitics, and types of single pho...


Generation and Selection of Pronunciation Variants for a Flexible Word Recognizer

This paper presents an approach for the generation and selection of pronunciation transcriptions for a flexible word recognizer. The basic idea is to produce pronunciation variants and corresponding scores with a set of pronunciation variation rules, which are weighted with their frequencies of occurrence measured on the training data. This approach addresses the problem of interfering transcripti...



Publication date: 1996